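The Many Applications of Generative AI: From Text to Images

2024 iThome Ironman Contest

DAY 12

Generative AI

The Magic of Semantic Kernel: Exploring Generative Applications with .NET, series part 12

Tags: 16th Ironman Contest, semantic kernel, ai, generative ai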

Ian

2024-09-25 21:48:08

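All the examples so far have generated text, but generative AI can also produce images. OpenAI's DALL-E models, for example, take a text prompt and return an image. Semantic Kernel supports image generation on both OpenAI and Azure OpenAI, and this post shows how little code an image generation application takes with Semantic Kernel.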

Both OpenAI and Azure OpenAI currently offer two DALL-E models for image generation, DALL-E-2 and DALL-E-3, and their capabilities differ:

prompt: DALL-E-2 accepts at most 1000 characters, while DALL-E-3 accepts at most 4000.
size: DALL-E-2 supports 256x256, 512x512, and 1024x1024, while DALL-E-3 supports 1024x1024, 1792x1024, and 1024x1792.
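
The limits listed above can be checked locally before calling the service, which avoids a round trip for a request that would be rejected anyway. The following is a minimal sketch, not part of Semantic Kernel: the helper name CheckImageRequest is hypothetical, and it assumes the model IDs are the strings "dall-e-2" and "dall-e-3".

```csharp
using System;
using System.Linq;

// Hypothetical helper: returns null when the request fits the limits listed above,
// otherwise a short description of the problem.
static string? CheckImageRequest(string modelId, string prompt, int width, int height)
{
    var (maxChars, sizes) = modelId switch
    {
        "dall-e-2" => (1000, new[] { (256, 256), (512, 512), (1024, 1024) }),
        "dall-e-3" => (4000, new[] { (1024, 1024), (1792, 1024), (1024, 1792) }),
        _ => throw new ArgumentException($"Unknown model: {modelId}", nameof(modelId)),
    };

    // Assumption: string.Length (UTF-16 code units) is close enough to the
    // character count the service applies.
    if (prompt.Length > maxChars)
        return $"Prompt has {prompt.Length} characters; {modelId} accepts at most {maxChars}.";
    if (!sizes.Contains((width, height)))
        return $"{width}x{height} is not supported by {modelId}.";
    return null;
}

Console.WriteLine(CheckImageRequest("dall-e-2", "a cat", 1792, 1024) ?? "ok"); // 1792x1024 is not supported by dall-e-2.
Console.WriteLine(CheckImageRequest("dall-e-3", "a cat", 1792, 1024) ?? "ok"); // ok
```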

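Image generation also takes longer than ordinary text generation, so responses are slower. The image comes back in one of two formats, base64 or URL. A URL stays valid for only 1 hour, and the file can be downloaded through it during that window.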

Image generation with OpenAI

Build the kernel and register the AddOpenAITextToImage service

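using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.TextToImage; // namespace of ITextToImageService

// Register the OpenAI text-to-image service on the kernel.
var kernel = Kernel.CreateBuilder()
            .AddOpenAITextToImage(
                modelId: Config.openai_dall_modelId,
                apiKey: Config.openai_apiKey)
            .Build();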

Get the service object that implements the ITextToImageService interface from the kernel
Semantic Kernel ships with a built-in OpenAI implementation of this text-to-image service.

var dallE = kernel.GetRequiredService<ITextToImageService>();

Next, call the GenerateImageAsync method
Pass in the prompt and the desired image dimensions.

var imageDescription = "一鄉二里共三夫子，不識四書五經六義，竟敢教七八九子，十分大膽";
var image = await dallE.GenerateImageAsync(imageDescription, 1024, 1024);

Get the generated image
The call returns a URL, and the image can be downloaded from it while it is valid (within 1 hour of generation).

Console.WriteLine(imageDescription);
Console.WriteLine("Image URL: " + image);
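
Because the URL expires after an hour, it is usually worth saving the image right away. The sketch below shows one way to do that with plain .NET APIs, covering both response formats mentioned earlier (URL and base64). The helper names SaveImageFromUrlAsync and SaveBase64ImageAsync are hypothetical and are not part of Semantic Kernel.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: downloads the image while the URL is still valid.
static async Task<string> SaveImageFromUrlAsync(HttpClient http, string imageUrl, string filePath)
{
    using var response = await http.GetAsync(imageUrl);
    response.EnsureSuccessStatusCode(); // a failed download (for example, an expired URL) throws here
    await using var file = File.Create(filePath);
    await response.Content.CopyToAsync(file);
    return Path.GetFullPath(filePath);
}

// Hypothetical helper: writes an image that was returned as a base64 string.
static async Task<string> SaveBase64ImageAsync(string base64, string filePath)
{
    await File.WriteAllBytesAsync(filePath, Convert.FromBase64String(base64));
    return Path.GetFullPath(filePath);
}

// Usage: dotnet run -- <image-url> <output-file>
if (args.Length == 2)
{
    using var http = new HttpClient();
    Console.WriteLine(await SaveImageFromUrlAsync(http, args[0], args[1]));
}
```

In the example above, the string returned by GenerateImageAsync would be passed as the imageUrl argument.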

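Generating images through conversation

The example above generates an image directly from a single prompt. Next, let's look at how to generate images from a conversation history.

Build the kernel and register the AddOpenAITextToImage and AddOpenAIChatCompletion services
This time we use a multi-turn conversation with chat history, so the kernel also registers AddOpenAIChatCompletion.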

var kernel = Kernel.CreateBuilder()
            .AddOpenAITextToImage(
                modelId: Config.openai_dall_modelId,
                apiKey: Config.openai_apiKey)
            .AddOpenAIChatCompletion(
                modelId: Config.openai_modelId,
                apiKey: Config.openai_apiKey)
            .Build();

Get the services that implement the IChatCompletionService and ITextToImageService interfaces
Semantic Kernel has built-in implementations of both for the OpenAI platform.

var completionService = kernel.GetRequiredService<IChatCompletionService>();
var dallE = kernel.GetRequiredService<ITextToImageService>();

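Create the system prompt
The system prompt specifies how images should be generated. For example, it can ask the LLM to refine the user's prompt first and then generate the image from the refined prompt.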

var system_Prompt = 
"""
You're chatting with a user. Instead of replying directly to the user
provide the description of an image that expresses what you want to say.
The user won't see your message, they will see only the image. The system 
generates an image using your description, so it's important you describe the image with details.
""";

var chatHistory = new ChatHistory(system_Prompt);

Generate images over a multi-turn conversation
With this approach, watch out for OpenAI API rate limits to avoid exceptions being thrown.

var msg = "圖畫裡，龍不吟虎不嘯，小小書僮可笑可笑。棋盤內，車無輪馬無韁，叫聲將軍提防提防。";
chatHistory.AddUserMessage(msg);
Console.WriteLine($"User: {msg}");

var reply = await completionService.GetChatMessageContentAsync(chatHistory);
chatHistory.Add(reply);
var image = await dallE.GenerateImageAsync(reply.Content!, 1024, 1024);
Console.WriteLine($"Bot: {image}");
Console.WriteLine($"Img description: {reply}  \n\n");

msg = "十口心思，思君思國思社稷。八目尚賞，賞花賞月賞秋香。";
chatHistory.AddUserMessage(msg);
Console.WriteLine($"User: {msg}");
reply = await completionService.GetChatMessageContentAsync(chatHistory);
chatHistory.Add(reply);
image = await dallE.GenerateImageAsync(reply.Content!, 1024, 1024);
Console.WriteLine($"Bot: {image}");
Console.WriteLine($"Img description: {reply}  \n\n");

msg = "鶯鶯燕燕翠翠紅紅處處融融洽洽。雨雨風風花花葉葉年年暮暮朝朝。";
chatHistory.AddUserMessage(msg);
Console.WriteLine($"User: {msg}");
reply = await completionService.GetChatMessageContentAsync(chatHistory);
chatHistory.Add(reply);
image = await dallE.GenerateImageAsync(reply.Content!, 1024, 1024);
Console.WriteLine($"Bot: {image}");
Console.WriteLine($"Img description: {reply}  \n\n");
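
Each turn above makes two API calls back to back (chat completion, then image generation), so rate limits are easy to hit. One common mitigation is to retry with exponential backoff. The sketch below is a generic helper written only with standard .NET types. RetryAsync and its parameters are hypothetical names, and which exceptions count as transient is left to the caller-supplied predicate.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: retries an async call with exponential backoff while
// isTransient says the failure is worth retrying.
static async Task<T> RetryAsync<T>(
    Func<Task<T>> action,
    Func<Exception, bool> isTransient,
    int maxAttempts = 4,
    TimeSpan? initialDelay = null)
{
    var delay = initialDelay ?? TimeSpan.FromSeconds(2);
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            return await action();
        }
        catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
        {
            await Task.Delay(delay);
            delay = TimeSpan.FromTicks(delay.Ticks * 2); // double the wait before the next attempt
        }
    }
}

// Demo with a simulated failure: the first two calls fail, the third succeeds.
var calls = 0;
var result = await RetryAsync(
    () => ++calls < 3
        ? Task.FromException<string>(new TimeoutException("simulated rate limit"))
        : Task.FromResult("image-url"),
    ex => ex is TimeoutException,
    initialDelay: TimeSpan.FromMilliseconds(10));
Console.WriteLine($"{result} after {calls} calls"); // image-url after 3 calls
```

In the conversation code, the call to dallE.GenerateImageAsync could be wrapped in the same way, with a predicate that recognizes the rate-limit error your client actually throws.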

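Image generation with Azure OpenAI

Besides the OpenAI platform, Azure also offers the DALL-E models. Usage is much the same; the only difference is which services the kernel registers. Here we go straight to generating images through a multi-turn conversation.

Build the kernel
Switch to AddAzureOpenAITextToImage and AddAzureOpenAIChatCompletion; the rest of the code needs no changes.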

var kernel = Kernel.CreateBuilder()
            .AddAzureOpenAITextToImage(
                deploymentName: Config.aoai_dall_deployment,
                endpoint: Config.aoai_endpoint,
                apiKey: Config.aoai_apiKey)
            .AddAzureOpenAIChatCompletion(
                endpoint: Config.aoai_endpoint,
                deploymentName: Config.aoai_deployment,
                apiKey: Config.aoai_apiKey)
            .Build();

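On Azure OpenAI, generated content is blocked fairly often under its Responsible AI principles (this applies to both text and images), and Azure is stricter than the OpenAI platform. If this happens frequently in your scenario, the fix is to create a custom content filter in Azure OpenAI and adjust the filtering level. (Hey... I know what you're thinking. Even at the most permissive level, the image you have in mind will still be blocked.)

Error when generated content is blocked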

User: 圖畫裡，龍不吟虎不嘯，小小書僮可笑可笑。棋盤內，車無輪馬無韁，叫聲將軍提防提防。
Unhandled exception. Microsoft.SemanticKernel.HttpOperationException: HTTP 400 (invalid_request_error: content_policy_violation)

This request has been blocked by our content filters.
 ---> System.ClientModel.ClientResultException: HTTP 400 (invalid_request_error: content_policy_violation)

This request has been blocked by our content filters.
   at Azure.AI.OpenAI.ClientPipelineExtensions.ProcessMessageAsync(ClientPipeline pipeline, PipelineMessage message, RequestOptions options)
   at Azure.AI.OpenAI.Images.AzureImageClient.GenerateImagesAsync(BinaryContent content, RequestOptions options)
   at OpenAI.Images.ImageClient.GenerateImageAsync(String prompt, ImageGenerationOptions options, CancellationToken cancellationToken)
   at Microsoft.SemanticKernel.Connectors.OpenAI.ClientCore.RunRequestAsync[T](Func`1 request)
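
In the log above, the error code content_policy_violation appears in the messages of both the outer HttpOperationException and the inner ClientResultException. Instead of letting the exception crash the program, the application can detect this case and ask the user to rephrase. The sketch below walks the exception chain and matches on that code. IsContentFilterBlock is a hypothetical helper, and matching on message text is an assumption based only on this log, so it may break if the message format changes.

```csharp
using System;

// Hypothetical helper: returns true when any exception in the chain carries the
// content_policy_violation code seen in the log above.
static bool IsContentFilterBlock(Exception? ex)
{
    for (var current = ex; current is not null; current = current.InnerException)
    {
        if (current.Message.Contains("content_policy_violation", StringComparison.Ordinal))
            return true;
    }
    return false;
}

// Demo with a stand-in exception shaped like the one in the log.
var simulated = new InvalidOperationException(
    "Image generation failed",
    new Exception("HTTP 400 (invalid_request_error: content_policy_violation)"));
Console.WriteLine(IsContentFilterBlock(simulated));                   // True
Console.WriteLine(IsContentFilterBlock(new Exception("HTTP 429")));   // False
```

In the conversation loop, a try/catch around GenerateImageAsync could call this helper and print a friendly message for blocked prompts while rethrowing everything else.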

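Custom content filtering
Open Azure OpenAI Studio and create a custom content filter. Note that the UI here is easy to misread: dragging the slider all the way to the right is what allows Low and Medium (in plain terms, the most permissive setting). Content filtering also distinguishes between input and output content, and each can be configured independently as needed.
Apply the custom content filter to the model deployed in Azure OpenAI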